Identify Temporal Websites Based on User Behavior Analysis

Authors

  • Yong Wang
  • Yiqun Liu
  • Min Zhang
  • Shaoping Ma
  • Liyun Ru
Abstract

The web is growing rapidly, and it is almost impossible for a web crawler to download all new pages. Pages reporting breaking news should be added to a search engine's index as soon as they are published, while pages whose content is not time-sensitive can be left for later crawls. We collected and analyzed users' page-view data for 75,112,357 pages over 60 days. Using this data, we found that a large proportion of temporal pages are published by a small number of web sites providing news services, which should be crawled repeatedly at short intervals. Such temporal web sites, which have high freshness requirements, can be identified by our algorithm based on user behavior analysis of page-view data. With it, 51.6% of all temporal pages can be picked up with a small overhead of non-temporal pages. Using this method, web crawlers can focus on these web sites and download their pages with high priority.
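The abstract does not spell out the identification algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical rule: a page counts as temporal when most of its page views arrive within a few days of its first view, and a site counts as temporal when a large enough share of its pages are temporal. The function names, the input layout, and all thresholds (`window`, `share`, `min_ratio`, `min_pages`) are assumptions made for this example.

```python
from collections import defaultdict


def is_temporal_page(daily_views, window=3, share=0.8):
    """Hypothetical rule: a page is temporal if at least `share` of its
    views fall within `window` days of the first day it was viewed.

    daily_views maps a day index (int) to the number of views that day.
    """
    total = sum(daily_views.values())
    if total == 0:
        return False
    # First day on which the page received any view.
    first_day = min(day for day, views in daily_views.items() if views > 0)
    early = sum(
        views for day, views in daily_views.items() if day - first_day < window
    )
    return early / total >= share


def temporal_sites(page_views, min_ratio=0.5, min_pages=2):
    """Return sites whose share of temporal pages is at least `min_ratio`.

    page_views maps (site, url) to a daily_views dict (assumed layout).
    Sites with fewer than `min_pages` observed pages are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [temporal pages, all pages]
    for (site, _url), daily_views in page_views.items():
        counts[site][1] += 1
        if is_temporal_page(daily_views):
            counts[site][0] += 1
    return sorted(
        site
        for site, (temporal, total) in counts.items()
        if total >= min_pages and temporal / total >= min_ratio
    )


# Made-up page-view data for two hypothetical sites.
sample = {
    ("news.example", "/a"): {0: 90, 1: 8, 2: 1, 10: 1},
    ("news.example", "/b"): {5: 50, 6: 10},
    ("ref.example", "/x"): {0: 10, 10: 10, 20: 10, 30: 10},
    ("ref.example", "/y"): {1: 5, 15: 5, 40: 5},
}
print(temporal_sites(sample))
```

A crawler could then revisit the returned sites at short intervals and leave the remaining sites for later crawls, which is the prioritization the abstract argues for.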

Similar resources

Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)

Background and Aim: Information systems cannot be well designed or developed without a clear understanding of users' needs and of how they seek and evaluate information. This research was designed to analyze the query refinement behaviors of users of Ganj (the Iranian research institute of science and technology database) via log analysis. Methods: The method of this research is log anal...

A New Trust Model for B2C E-Commerce Based on 3D User Interfaces

Lack of trust is one of the key bottlenecks in e-commerce development. Nowadays many advanced technologies are trying to address the trust issues in e-commerce. One of them suggests using suitable user interfaces. This paper investigates the functionality and capabilities of 3D graphical user interfaces with regard to building trust among customers of the next generation of B2C e-commerce websit...

Quality Evaluation of Persian Websites on Depression Based on the WebMedQual Scale

  Introduction: Nowadays, anyone with any knowledge of the Internet environment can act as a producer and distributor of information. Unlike most traditional media of information transmission, the Internet lacks information control and quality management of content. As a result, the quality of health information on the Internet is doubtful. The objective of this study is to guide patients to ...

TAM2-based Study of Website User Behavior—Using Web 2.0 Websites as an Example

In recent years, we have seen a return of web-based applications built with new ideas and new commercial models. The key momentum for the development of such applications is the Web 2.0 technology. Web 2.0 websites are dynamic and characterized by user interaction, sharing, and participation. The emergence of this new business model brings new business opportunities. In fact, website users are ...

Analysis of Usage Patterns in Large Multimedia Websites

User behavior in a website is a critical indicator of the web site’s usability and success. Therefore an understanding of usage patterns is essential to website design optimization. In this context, large multimedia websites pose a significant challenge for comprehension of the complex and diverse user behaviors they sustain. This is due to the complexity of analyzing and understanding user-dat...

Publication date: 2008